Systems for knowledge-intensive tasks such as open-domain question answering (QA) usually consist of two stages: efficient retrieval of relevant documents from a large corpus and detailed reading of the selected documents to generate answers. Retrievers and readers are usually modeled separately, which necessitates a cumbersome implementation and is hard to train and adapt in an end-to-end fashion. In this paper, we revisit this design and eschew the separate architecture and training in favor of a single Transformer that performs Retrieval as Attention (ReAtt), and end-to-end training solely based on supervision from the end QA task. We demonstrate for the first time that a single model trained end-to-end can achieve both competitive retrieval and QA performance, matching or slightly outperforming state-of-the-art separately trained retrievers and readers. Moreover, end-to-end adaptation significantly boosts its performance on out-of-domain datasets in both supervised and unsupervised settings, making our model a simple and adaptable solution for knowledge-intensive tasks. Code and models are available at https://github.com/jzbjyb/ReAtt.
translated by 谷歌翻译
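Evaluation of text generation to date has focused mainly on content created sequentially, rather than on improvements to an existing piece of text. Writing, however, is naturally an iterative and incremental process that requires expertise in different modular skills, such as fixing outdated information or making the style more consistent. Even so, comprehensive evaluation of a model's capacity to perform these skills and to edit remains sparse. This work presents EditEval: an instruction-based benchmark and evaluation suite that leverages existing and new datasets to automatically evaluate editing capabilities such as making text more cohesive and paraphrasing. We evaluate several pre-trained models and find that InstructGPT and PEER perform best, but that most baselines fall below the supervised SOTA, particularly when neutralizing and updating information. Our analysis also shows that commonly used metrics for editing tasks do not always correlate well, and that optimizing for the prompts with the highest performance does not necessarily yield the strongest robustness across different models. By releasing this benchmark and a publicly available leaderboard challenge, we hope to unlock future research on models capable of iterative and more controllable editing.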
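Textual content is often the output of a collaborative writing process: we start with an initial draft, ask for suggestions, and repeatedly make changes. Agnostic of this process, today's language models are trained to generate only the final result. As a consequence, they lack several abilities crucial for collaborative writing: they are unable to update existing texts, are difficult to control, and are incapable of verbally planning or explaining their actions. To address these shortcomings, we introduce PEER, a collaborative language model trained to imitate the entire writing process itself: PEER can write drafts, add suggestions, propose edits, and provide explanations for its actions. Crucially, we train multiple instances of PEER that are able to infill various parts of the writing process, which enables the use of self-training techniques to increase the quality, amount, and diversity of training data. This unlocks PEER's full potential by making it applicable in domains for which no edit histories are available and by improving its ability to follow instructions, write useful comments, and explain its actions. We show that PEER achieves strong performance across various domains and editing tasks.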
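Natural-language-to-code models learn to generate a code snippet given a natural language (NL) intent. However, because new libraries and functions are introduced daily, the rapid growth of both publicly available and proprietary libraries and functions makes it impossible to cover all APIs with training examples. Existing models therefore inherently cannot generalize to unseen functions and libraries merely by incorporating them into the training data. In contrast, when human programmers write programs, they frequently refer to textual resources such as code manuals, documentation, and tutorials to explore and understand the available library functionality. Inspired by this observation, we introduce DocCoder: an approach that explicitly leverages code manuals and documentation by (1) retrieving the relevant documentation given the NL intent, and (2) generating the code based on the NL intent and the retrieved documentation. Our approach is general, can be applied to any programming language, and is agnostic to the underlying neural model. We demonstrate that DocCoder consistently improves NL-to-code models: on TLDR, a new Bash dataset, DocCoder scores 11x higher than strong baselines; on the popular Python CoNaLa benchmark, DocCoder improves over strong baselines by 1.65 BLEU.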
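The two-step pipeline described in the abstract above (retrieve documentation for the NL intent, then generate code conditioned on both) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the token-overlap scorer is a toy stand-in for a real retriever, the `DOCS` pool is hypothetical, and the names `retrieve` and `build_generator_input` are not taken from the paper or its code. The generation step itself is omitted; the sketch only shows how the generator's input could be assembled.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(intent, docs, k=1):
    """Step 1: rank documentation entries by token overlap with the NL intent.

    Token overlap is a toy stand-in for a real retriever (assumption).
    Ties are broken alphabetically so the ranking is deterministic.
    """
    query = tokenize(intent)
    ranked = sorted(docs, key=lambda name: (-len(query & tokenize(docs[name])), name))
    return ranked[:k]

def build_generator_input(intent, docs, retrieved):
    """Step 2 (input side only): prepend the retrieved docs to the intent."""
    context = "\n".join(f"[doc:{name}] {docs[name]}" for name in retrieved)
    return f"{context}\n[intent] {intent}"

# Hypothetical documentation pool, written for this example only.
DOCS = {
    "ls": "list directory contents; -S sorts by file size",
    "tar": "create or extract archive files",
    "grep": "print lines that match a pattern",
}

intent = "list files sorted by size"
top_docs = retrieve(intent, DOCS, k=1)
print(build_generator_input(intent, DOCS, top_docs))
```

In a real system the string returned by `build_generator_input` would be passed to a code-generation model. Keeping retrieval and input construction separate reflects the abstract's claim that the approach does not depend on the underlying neural model.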
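The information in tables can be an important complement to text, which makes table-based question answering (QA) systems highly valuable. The intrinsic complexity of handling tables often adds an extra burden to both model design and data annotation. In this paper, we aim to develop a simple table-based QA model with minimal annotation effort. Because table-based QA requires both alignment between questions and tables and the ability to perform complex reasoning over multiple table elements, we propose an omnivorous pretraining approach that consumes both natural and synthetic data to endow models with these respective abilities. Specifically, given freely available tables, we use retrieval to pair them with relevant natural sentences for mask-based pretraining, and we synthesize NL questions by converting SQL sampled from tables, for pretraining with a QA loss. We perform extensive experiments in both few-shot and full settings, and the results clearly demonstrate the superiority of our model OmniTab: the best multitasking approach achieves absolute gains of 16.2% and 2.7% in the 128-shot and full settings respectively, and also establishes a new state of the art on WikiTableQuestions. Detailed ablations and analyses reveal different characteristics of natural and synthetic data, shedding light on future directions for omnivorous pretraining. Code, pretraining data, and pretrained models are available at https://github.com/jzbjyb/omnitab.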
Recent work has presented intriguing results examining the knowledge contained in language models (LMs) by having the LM fill in the blanks of prompts such as "Obama is a ___ by profession". These prompts are usually manually created, and quite possibly suboptimal; another prompt such as "Obama worked as a ___" may result in more accurately predicting the correct profession. Because of this, given an inappropriate prompt, we might fail to retrieve facts that the LM does know, and thus any given prompt only provides a lower bound estimate of the knowledge contained in an LM. In this paper, we attempt to more accurately estimate the knowledge contained in LMs by automatically discovering better prompts to use in this querying process. Specifically, we propose mining-based and paraphrasing-based methods to automatically generate high-quality and diverse prompts, as well as ensemble methods to combine answers from different prompts. Extensive experiments on the LAMA benchmark for extracting relational knowledge from LMs demonstrate that our methods can improve accuracy from 31.1% to 39.6%, providing a tighter lower bound on what LMs know. We have released the code and the resulting LM Prompt And Query Archive (LPAQA) at https://github.com/jzbjyb/LPAQA. Footnote 1: Some models we use in this paper, e.g. BERT (Devlin et al., 2019), are bi-directional and do not directly define a probability distribution over text, which is the underlying definition of an LM. Nonetheless, we call them LMs for simplicity.
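To make the idea of combining answers from different prompts concrete, here is a minimal sketch of one simple combination rule: a weighted average of per-prompt log-probabilities over candidate answers. This is an illustrative assumption, not necessarily the ensemble used in the paper; the function name `ensemble_answer`, the uniform default weights, and the toy probabilities are all hypothetical.

```python
import math

def ensemble_answer(prompt_probs, weights=None):
    """Combine per-prompt answer distributions.

    prompt_probs: list of dicts mapping candidate answer -> probability,
                  one dict per prompt (toy inputs, not real LM outputs).
    weights:      optional per-prompt weights; uniform if omitted.
    Returns the best candidate and the combined score of every candidate.
    """
    if not prompt_probs:
        raise ValueError("at least one prompt distribution is required")
    if weights is None:
        weights = [1.0 / len(prompt_probs)] * len(prompt_probs)
    candidates = set().union(*(p.keys() for p in prompt_probs))
    floor = 1e-12  # avoids log(0) when a prompt assigns no mass to a candidate
    scores = {
        c: sum(w * math.log(max(p.get(c, 0.0), floor))
               for w, p in zip(weights, prompt_probs))
        for c in candidates
    }
    best = max(scores, key=scores.get)
    return best, scores

# Toy distributions for two prompts about the same fact (hypothetical numbers).
probs_prompt_a = {"politician": 0.4, "lawyer": 0.5}  # "Obama is a ___ by profession"
probs_prompt_b = {"politician": 0.7, "lawyer": 0.2}  # "Obama worked as a ___"

best, scores = ensemble_answer([probs_prompt_a, probs_prompt_b])
print(best)
```

In this toy case the first prompt alone would rank "lawyer" highest, while the combined score favors "politician", which is the kind of correction that an ensemble over prompts is meant to provide.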
This paper focuses on designing efficient models with few parameters and FLOPs for dense predictions. Even though CNN-based lightweight methods have achieved impressive results after years of research, the trade-off between model accuracy and constrained resources still needs further improvement. This work rethinks the essential unity of the efficient Inverted Residual Block in MobileNetv2 and the effective Transformer in ViT, inductively abstracting a general concept of a Meta-Mobile Block, and we argue that the specific instantiation is very important to model performance even when the framework is shared. Motivated by this phenomenon, we deduce a simple yet efficient modern Inverted Residual Mobile Block (iRMB) for mobile applications, which absorbs CNN-like efficiency to model short-distance dependencies and Transformer-like dynamic modeling capability to learn long-distance interactions. Furthermore, we design a ResNet-like 4-phase Efficient MOdel (EMO) based only on a series of iRMBs for dense applications. Extensive experiments on the ImageNet-1K, COCO2017, and ADE20K benchmarks demonstrate the superiority of our EMO over state-of-the-art methods; e.g., our EMO-1M/2M/5M achieve 71.5, 75.1, and 78.4 Top-1, surpassing SoTA CNN-/Transformer-based models while balancing model accuracy and efficiency well.
We aim to bridge the gap between common-sense few-sample human learning and large-data machine learning. We derive a theory of human-like few-shot learning from the von Neumann-Landauer principle. Modelling human learning is difficult because how people learn varies from one person to another. Under commonly accepted definitions, we prove that all human or animal few-shot learning, and the major models of such learning, including the Free Energy Principle and Bayesian Program Learning, approximate our theory under the Church-Turing thesis. We find that deep generative models such as the variational autoencoder (VAE) can be used to approximate our theory and perform significantly better than baseline models, including deep neural networks, for image recognition, low-resource language processing, and character recognition.
Despite significant progress in object categorization in recent years, a number of important challenges remain; mainly, the ability to learn from limited labeled data and to recognize object classes within a large, potentially open, set of labels. Zero-shot learning is one way of addressing these challenges, but it has only been shown to work with limited-size class vocabularies and typically requires separation between supervised and unsupervised classes, allowing the former to inform the latter but not vice versa. We propose the notion of vocabulary-informed learning to alleviate the above-mentioned challenges and to address the problems of supervised, zero-shot, generalized zero-shot, and open set recognition using a unified framework. Specifically, we propose a weighted maximum margin framework for semantic manifold-based recognition that incorporates distance constraints from (both supervised and unsupervised) vocabulary atoms. Distance constraints ensure that labeled samples are projected closer to their correct prototypes, in the embedding space, than to others. We illustrate that the resulting model shows improvements in supervised, zero-shot, generalized zero-shot, and large open set recognition, with a class vocabulary of up to 310K on the Animals with Attributes and ImageNet datasets.
We consider infinite horizon Markov decision processes (MDPs) with fast-slow structure, meaning that certain parts of the state space move "fast" (and in a sense, are more influential) while other parts transition more "slowly." Such structure is common in real-world problems where sequential decisions need to be made at high frequencies, yet information that varies at a slower timescale also influences the optimal policy. Examples include: (1) service allocation for a multi-class queue with (slowly varying) stochastic costs, (2) a restless multi-armed bandit with an environmental state, and (3) energy demand response, where both day-ahead and real-time prices play a role in the firm's revenue. Models that fully capture these problems often result in MDPs with large state spaces and large effective time horizons (due to frequent decisions), rendering them computationally intractable. We propose an approximate dynamic programming algorithmic framework based on the idea of "freezing" the slow states, solving a set of simpler finite-horizon MDPs (the lower-level MDPs), and applying value iteration (VI) to an auxiliary MDP that transitions on a slower timescale (the upper-level MDP). We also extend the technique to a function approximation setting, where a feature-based linear architecture is used. On the theoretical side, we analyze the regret incurred by each variant of our frozen-state approach. Finally, we give empirical evidence that the frozen-state approach generates effective policies using just a fraction of the computational cost, while illustrating that simply omitting slow states from the decision modeling is often not a viable heuristic.